Add new collector exposing 'ksmd' stats #165
Could you also add fixtures, a few tests, and wire it into the end-to-end test? For the latter, add the collector to the list of collectors in the script, and run the script with …
prometheus.CounterOpts{
	Namespace: Namespace,
	Subsystem: ksmdSubsystem,
	Name:      "full_scans",
By convention, counters should be suffixed with _total.
@brian-brazil While this is definitely a recommended practice per the Prometheus docs, I thought node_exporter was a somewhat "special" exporter that followed the rule "do not change the underlying OS metric name", and a quick search through the sources confirmed this.
If that is no longer true, and new collectors should use the '_total' suffix for counters regardless of the underlying metric name, then sure, I will fix that.
#150 covers fixing this generally; you should add _total where it's practical to do so and clear that the metric is a counter.
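To make the naming convention concrete, here is a stdlib-only Go sketch of how the exported counter name could be assembled. It assumes Namespace is "node" and ksmdSubsystem is "ksmd" (neither value appears in the diff); fqName imitates client_golang's prometheus.BuildFQName, and counterName is a hypothetical helper, not part of this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty namespace, subsystem and name parts with
// underscores. It imitates client_golang's prometheus.BuildFQName,
// including returning "" when name is empty.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

// counterName (hypothetical helper) appends the conventional "_total"
// suffix to a counter's base name unless it is already present.
func counterName(name string) string {
	if strings.HasSuffix(name, "_total") {
		return name
	}
	return name + "_total"
}

func main() {
	// Assumed values for Namespace and ksmdSubsystem; neither is shown in the diff.
	const namespace, subsystem = "node", "ksmd"
	fmt.Println(fqName(namespace, subsystem, counterName("full_scans")))
}
```

Under those assumptions it prints node_ksmd_full_scans_total.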
@matthiasr Yeah, sure. I'll add some tests.
@matthiasr I've added the ksmd collector to the end-to-end test. Regarding unit tests, I'm not sure they are needed: there is no complex structure in those sysfs files, each of which just exports a single integer value.
Ping, it looks like this needs a rebase.
Rebased onto master.
var (
	// ksmdFiles lists the files under /sys/kernel/mm/ksm exported by the collector.
	ksmdFiles = []string{"full_scans", "merge_across_nodes", "pages_shared", "pages_sharing",
		"pages_to_scan", "pages_unshared", "pages_volatile", "run", "sleep_millisecs"}
)
Given that this collector already explicitly defines the list of files/metrics to export, I wonder whether we should also use seconds instead of milliseconds here. WDYT, @brian-brazil?
Yes, this should be seconds. It looks like we can also convert pages to bytes as this feature doesn't support huge pages.
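The two conversions suggested here are simple arithmetic. A minimal Go sketch follows; the helper names are hypothetical, and reading the page size at runtime with os.Getpagesize (rather than hardcoding 4096) is an assumption about how it would be done.

```go
package main

import (
	"fmt"
	"os"
)

// millisToSeconds converts a millisecond value such as the one in
// sleep_millisecs into seconds, Prometheus' base unit for time.
func millisToSeconds(ms uint64) float64 {
	return float64(ms) / 1000
}

// pagesToBytes converts a page count into bytes. Per the comment above,
// KSM doesn't support huge pages, so the base page size is used.
func pagesToBytes(pages uint64, pageSize int) float64 {
	return float64(pages) * float64(pageSize)
}

func main() {
	// os.Getpagesize returns the system's base page size (often 4096).
	pageSize := os.Getpagesize()
	fmt.Println(millisToSeconds(1500))
	fmt.Println(pagesToBytes(1000, pageSize))
}
```

Querying the page size instead of assuming 4096 keeps the byte values correct on architectures configured with larger base pages.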
Sure, I will rename sleep_millisecs to sleep_seconds. Regarding the conversion of pages to bytes, I'm not sure that this is a good idea. First of all, ksmd operates exclusively on pages. And while having metrics like pages_shared, pages_sharing and so on in bytes looks good (one might want to see the real memory savings in bytes), converting pages_to_scan (which is a setting, not a metric) to bytes just looks weird and confusing.
If someone wants bytes instead of pages, this can easily be done with a Prometheus query.
We convert pages to bytes wherever possible everywhere else, so that users don't have to figure out how to convert from a multitude of different units to get bytes.
It's probably best to exclude pages_to_scan; that doesn't seem like something useful to export as a metric. We'd normally only expose settings that are limits (so that you can calculate how full something is) or that are commonly changed.
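Putting the suggestions so far together, one possible file-to-name mapping can be sketched in Go. The exported names below are purely illustrative (the thread never settles on them), and pages_to_scan is kept in pages only to reflect the point that it is a setting; whether it is exported at all is still being discussed.

```go
package main

import (
	"fmt"
	"strings"
)

// ksmdFiles is the file list from the diff above.
var ksmdFiles = []string{"full_scans", "merge_across_nodes", "pages_shared", "pages_sharing",
	"pages_to_scan", "pages_unshared", "pages_volatile", "run", "sleep_millisecs"}

// exportName returns an illustrative exported name for a ksm sysfs file:
// the full_scans counter gets "_total", sleep_millisecs becomes seconds,
// page counts become bytes, and the pages_to_scan setting keeps its name
// because it is a setting rather than an amount of memory.
func exportName(file string) string {
	switch {
	case file == "full_scans":
		return file + "_total"
	case file == "sleep_millisecs":
		return "sleep_seconds"
	case file == "pages_to_scan":
		return file
	case strings.HasPrefix(file, "pages_"):
		return strings.TrimPrefix(file, "pages_") + "_bytes"
	default:
		return file
	}
}

func main() {
	for _, f := range ksmdFiles {
		fmt.Printf("%-20s -> %s\n", f, exportName(f))
	}
}
```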
> It's probably best to exclude pages_to_scan, that doesn't seem like something useful to export as a metric.
Actually, it is useful. It's constantly adjusted by the ksmtuned daemon (on RHEL-based systems) depending on the current memory pressure, so it's useful to know this setting's value in order to correlate it with CPU usage and the rate at which pages are being shared.
👍 Thanks, and sorry for the long delay. I'll wait for @brian-brazil's comment on my question and include this in the next release (expect it this week).
👍 for the tests as they are. The E2E test will be enough to catch regressions.
Add new collector exposing 'ksmd' stats
Add a new collector which exposes the contents of the /sys/kernel/mm/ksm
directory. This directory contains control and statistics files for the
Kernel Samepage Merging daemon.
This is useful for monitoring hosts that run the KVM hypervisor, as this information provides a deeper understanding of what happens to the memory subsystem on such hosts.
The collector is not enabled by default.
Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com>